An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts
نویسندگان
چکیده
Résumé. Nous présentons dans cet article une étude empirique de l’application de l’approche de l’entropie maximale pour l’étiquetage syntaxique de textes vietnamiens. Le vietnamien est une langue qui possède des caractéristiques spéciales qui la distinguent largement des langues occidentales. Notre meilleur étiqueteur explore et inclut des connaissances utiles qui, en terme de performance pour l’étiquetage de textes vietnamiens, fournit un taux de précision globale de 93.40% et de 80.69% pour les mots inconnus sur un ensemble de test du corpus arboré vietnamien. Notre étiqueteur est nettement supérieur à celui qui est en train d’être utilisé pour développer le corpus arboré vietnamien, et à l’heure actuelle c’est le meilleur résultat obtenu pour l’étiquetage de textes vietnamiens.
منابع مشابه
An Ontology-Based Approach for Key Phrase Extraction
Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques and a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extracting of Vietnamese text that exploits the Vietnamese Wikipedia as an ontology and exploits specif...
متن کاملAn improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملPart-of-Speech Tagging for Middle English through Alignment and Projection of Parallel Diachronic Texts
We demonstrate an approach for inducing a tagger for historical languages based on existing resources for their modern varieties. Tags from a Present Day English Bible are projected to a Middle English Bible using multiple alignment approaches and are smoothed with a bigram tagger. Finally, we train a maximum entropy tagger on the output of the bigram tagger on the target text and test it on ta...
متن کاملA Two-Stage Approach to Chinese Part-of-Speech Tagging
This paper describes a Chinese part-ofspeech tagging system based on the maximum entropy model. It presents a novel two-stage approach to using the part-ofspeech tags of the words on both sides of the current word in Chinese part-of-speech tagging. The system is evaluated on four corpora at the Fourth SIGHAN Bakeoff in the close track of the Chinese part-ofspeech tagging task.
متن کاملQualitative and Quantitative Examination of Text Type Readabilities: A Comparative Analysis
This study compared 2 main approaches to readability assessment. Thequantitative approach applied idea density based on part of speech tagging andcompared 3 sets of text types (i.e., narrative, expository, and argumentative) withrespect to their ease of reading. The qualitative approach was done throughdeveloping questionnaires measuring intermediate EFL learners’ perceptions oncontent, motivat...
متن کامل